Correlation of Data Reconstruction Error and Shrinkages in Pair-wise Distances under Principal Component Analysis (PCA)
Author
Abstract
In this ongoing work, I explore certain theoretical and empirical implications of data transformations under PCA. In particular, I state and prove three theorems about PCA, which I paraphrase as follows: (1) PCA without discarding eigenvector rows is injective, but loses this injectivity when eigenvector rows are discarded. (2) PCA without discarding eigenvector rows preserves pair-wise distances, but tends to shrink pair-wise distances when eigenvector rows are discarded. (3) For any pair of points, the shrinkage in pair-wise distance is bounded above by an L1 norm reconstruction error associated with the points. Result (3) suggests that there might be some correlation between shrinkages in pair-wise distances and the mean square reconstruction error, which is defined as the sum of the eigenvalues associated with the discarded eigenvectors. I therefore performed numerical experiments to measure the correlation between this sum of eigenvalues and the shrinkages in pair-wise distances. In addition, I performed experiments to check, respectively, the effect of this sum of eigenvalues and the effect of the shrinkages on classification accuracies under the PCA map. So far, I have obtained the following results on some publicly available data from the UCI Machine Learning Repository: (1) There seems to be a strong correlation between the sum of the eigenvalues associated with discarded eigenvectors and shrinkages in pair-wise distances. (2) Neither the sum of those eigenvalues nor the shrinkages in pair-wise distances have any strong correlation with classification accuracies.
arXiv:1412.6752v1 [cs.LG] 21 Dec 2014
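The three statements paraphrased in the abstract can be checked numerically on a small example. The sketch below is not the author's experimental code: it uses synthetic data rather than the UCI datasets, and the helper names (`pca_fit`, `pairwise_distances`), the choice `k = 2`, and the exact form of the L1 bound (the sum of the two points' L1 reconstruction errors, which follows from the triangle inequality) are assumptions made for illustration. The paper's own bound may be stated differently.

```python
import numpy as np

def pca_fit(X):
    """Return the data mean, eigenvalues (descending) and eigenvector rows."""
    mean = X.mean(axis=0)
    # bias=True divides by n, so eigenvalues match the mean square error below
    cov = np.cov(X - mean, rowvar=False, bias=True)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return mean, eigvals[order], eigvecs[:, order].T

def pairwise_distances(Z):
    """Euclidean distance between every pair of rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Synthetic correlated data (an assumption; the paper uses UCI datasets)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
mean, eigvals, W = pca_fit(X)

k = 2                                  # number of eigenvector rows kept (hypothetical)
Z_full = (X - mean) @ W.T              # PCA with no rows discarded
Z_k = (X - mean) @ W[:k].T             # PCA with the last rows discarded
X_rec = Z_k @ W[:k] + mean             # reconstruction from k components

D_orig = pairwise_distances(X)
D_full = pairwise_distances(Z_full)
D_k = pairwise_distances(Z_k)
shrinkage = D_orig - D_k               # non-negative up to rounding

# Mean square reconstruction error versus the sum of discarded eigenvalues
mse = ((X - X_rec) ** 2).sum(axis=1).mean()
discarded = eigvals[k:].sum()

# Assumed form of the L1 bound: sum of the two points' L1 reconstruction errors
l1_err = np.abs(X - X_rec).sum(axis=1)
l1_bound = l1_err[:, None] + l1_err[None, :]

print("distances preserved with all rows:", np.allclose(D_full, D_orig))
print("no pair-wise distance grew:", bool((shrinkage >= -1e-9).all()))
print("MSE equals discarded eigenvalue sum:", bool(np.isclose(mse, discarded)))
print("shrinkage within assumed L1 bound:", bool((shrinkage <= l1_bound + 1e-9).all()))
```

Repeating this for several values of `k` gives pairs of (discarded eigenvalue sum, mean shrinkage) whose correlation could then be computed, which is the kind of experiment the abstract describes.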
Similar resources
Quantitative principal component model for skin chromophore mapping using multi-spectral images and spatial priors
We describe a novel reconstruction algorithm based on Principal Component Analysis (PCA) applied to multi-spectral imaging data. Using numerical phantoms, based on a two layered skin model developed previously, we found analytical expressions, which convert qualitative PCA results into quantitative blood volume and oxygenation values, assuming the epidermal thickness to be known. We also evalua...
Principal component analysis of CYP2C9 and CYP3A4 probe substrate/inhibitor panels.
Cytochrome P450 (P450) inhibition often occurs in a strongly substrate- and inhibitor-dependent manner, with a given inhibitor affecting the metabolism of different substrates to differing degrees and with a given substrate responding differently to different inhibitors. Traditionally, patterns of functional similarity and dissimilarity among substrates and inhibitors have been studied using cl...
Derivation of regression models for pan evaporation estimation
Evaporation is an essential component of the hydrological cycle. Several meteorological factors play a role in the amount of pan evaporation. These factors are often related to each other. In this study, multiple linear regression (MLR) in conjunction with Principal Component Analysis (PCA) was used for modeling pan evaporation. After the standardization of the variables, independent components were...
Non-Greedy L21-Norm Maximization for Principal Component Analysis
Principal Component Analysis (PCA) is one of the most important unsupervised methods for handling high-dimensional data. However, due to the high computational complexity of its eigendecomposition solution, it is hard to apply PCA to large-scale data with high dimensionality. Meanwhile, the squared L2-norm based objective makes it sensitive to data outliers. In recent research, the L1-norm maximi...
Representing Spectral data using LabPQR color space in comparison to PCA method
In many applications of color technology, such as spectral color reproduction, it is of interest to represent the spectral data with fewer dimensions than the spectral space has. For more than half a century, the Principal Component Analysis (PCA) method has been applied to find the number of independent basis vectors of a spectral dataset and to represent spectral reflectance with lower di...
Journal title:
- CoRR
Volume abs/1412.6752 Issue
Pages -
Publication date 2014